feat: add EAGLE3 support for Step-3.5-Flash #530
Open
zijiexia wants to merge 10 commits into sgl-project:main from
Conversation
Author

This support also requires changes on the sglang side; PR raised: sgl-project/sglang#22718
Summary
- **New chat template** (`specforge/data/template.py`): registers `step3.5`, a thinking-enabled template using `<|im_start|>`/`<|im_end|>` tokens, matching Step-3.5-Flash's format.
- **New draft model config** (`configs/step-3.5-flash-eagle3.json`): EAGLE3 architecture config for Step-3.5-Flash: a 1-layer `LlamaForCausalLMEagle3` with aux hidden states captured from layers 4, 20, and 40.
- **Training script** (`examples/run_step3p5_flash_eagle3_online.sh`): end-to-end online EAGLE3 training script for Step-3.5-Flash with the SGLang backend, FA3 attention, and W&B logging.
- **smoltalk-chinese dataset** (`scripts/prepare_data.py`): adds `process_smoltalk_row` and wires up `zjxia/smoltalk-chinese` as a supported dataset option.
- **Fix `sglang_max_total_tokens` OOM for SWA models** (`specforge/args.py`): changed `target_batch_size * max_length` to `int(target_batch_size * max_length * 1.2)`. The 1.2× buffer is driven by three structural properties of SGLang's SWA memory allocator:
  1. `alloc_paged_token_slots_extend` over-reserves by `batch_size × page_size` slots on every extend call (`mem_cache/common.py:267`).
  2. SWA models maintain two token pools (`full_attn` and `swa_attn`), each independently applying the same overhead check (`swa_memory_pool.py:370–414`).
  3. The SWA pool is sized at `swa_full_tokens_ratio = 0.8×` of the full pool, so it exhausts first. Compensating for that shrinkage alone requires `1/0.8 = 1.25×`, and factors 1 and 2 add further overhead on top, pushing the theoretical requirement slightly above 1.25×. In practice, because page-alignment overhead is small (~0.78% per extend batch at batch=128, page_size=16, max_length=2048), 1.2× is empirically sufficient and avoids unnecessary over-reservation of the token pool.

Test plan
- Run `run_step3p5_flash_eagle3_online.sh` and confirm the training loop starts without OOM
- Confirm the `smoltalk-chinese` dataset processes correctly via `prepare_data.py --dataset smoltalk-chinese`
- Confirm the `step3.5` template tokenizes a sample conversation as expected

🤖 Generated with Claude Code
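For illustration, the `<|im_start|>`/`<|im_end|>` turn format described in the Summary could be rendered roughly like this. This is a sketch only: the actual `step3.5` template registered in `specforge/data/template.py` also covers the thinking-enabled behavior and may differ in detail (`render_step35` is a hypothetical helper, not part of this PR).

```python
# Sketch of the <|im_start|>role\ncontent<|im_end|> turn framing used by
# Step-3.5-Flash-style chat templates. Hypothetical helper for illustration.
def render_step35(messages):
    """Render a list of {role, content} dicts into the im_start/im_end format."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    return "".join(parts)


if __name__ == "__main__":
    sample = [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi there!"},
    ]
    print(render_step35(sample))
```

A real template would additionally append the generation prompt (`<|im_start|>assistant\n`) when tokenizing for inference.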
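The 1.2× buffer arithmetic from the Summary can be checked back-of-envelope. This is a sketch under the numbers quoted above; `swa_token_budget` is a hypothetical helper mirroring the changed expression in `specforge/args.py`, not SGLang's API.

```python
# Back-of-envelope check of the SWA token-pool buffer described above.
def swa_token_budget(target_batch_size, max_length, buffer=1.2):
    # Mirrors the changed expression: int(target_batch_size * max_length * 1.2)
    return int(target_batch_size * max_length * buffer)


batch, page_size, max_length = 128, 16, 2048

# Baseline tokens needed with no allocator overhead.
base = batch * max_length  # 262144

# Per-extend page-alignment over-reservation: batch_size * page_size extra slots.
page_overhead = (batch * page_size) / base  # 16/2048, roughly 0.78% per extend batch

# SWA pool shrinkage alone would require a 1/0.8 = 1.25x factor.
shrinkage_factor = 1 / 0.8

print(f"page-alignment overhead: {page_overhead:.4%}")
print(f"shrinkage-only factor:   {shrinkage_factor}")
print(f"budget with 1.2x buffer: {swa_token_budget(batch, max_length)}")
```

The printout makes the trade-off concrete: the theoretical worst case sits slightly above 1.25×, but because the per-extend overhead is under 1%, the 1.2× buffer held up empirically in this PR's training runs.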